Application to movement ecology
ENSAI - CREST
https://marieetienne.github.io/MAF/
2024-12-09
Movement informs us about internal states and habitat preferences
A continuous process sampled at some discrete, potentially irregular times.
Time series with values in \(\mathbb{R}^2\) (on Earth…).
\[ \begin{array}{|c|c|c|c|} \hline \text{Time} & \text{Location} & \text{Turning Angle} & \text{Speed} \\ \hline t_{0} & (x_0, y_0) & \text{NA} & \text{NA} \\ t_{1} & (x_1, y_1) & \text{NA} & sp_1 \\ t_{2} & (x_2, y_2) & ang_2 & sp_2 \\ \vdots & \vdots & \vdots & \vdots \\ t_{n} & (x_n, y_n) & ang_n & sp_n \\ \hline \end{array} \]
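For concreteness, here is a minimal Python sketch (the function name and the planar-coordinates assumption are illustrative) deriving the speed and turning-angle columns from the raw locations; the leading NA entries in the table come from the differencing.

```python
import numpy as np

def track_metrics(t, x, y):
    """Speed and turning angle from a track sampled at times t.

    sp_i is the step length between fixes i-1 and i divided by the
    time lag; ang_i is the change of heading between consecutive
    steps, hence the NA entries at the start of the table above.
    """
    dx, dy = np.diff(x), np.diff(y)
    speed = np.hypot(dx, dy) / np.diff(t)      # step length / time lag
    heading = np.arctan2(dy, dx)               # direction of each step
    ang = np.diff(heading)
    ang = (ang + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return speed, ang
```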
Rarely in a supervised context
Mostly unsupervised
Often analysed using discrete-time models, typically Hidden Markov Models
\[\left .\begin{array}{c} Y_{1k} \sim f(\theta_{1\ell})\\ Y_{2k} \sim f(\theta_{2\ell}) \end{array} \right\rbrace \mbox{ if } k \mbox{ in region }I_{\ell}=[\tau_{\ell-1}+1,\tau_{\ell}]\]
The best segmentation for a given number of change points \(L\): \[\arg\min_{0< \tau_1 < \ldots < \tau_L} C(\boldsymbol{Y}) \]
Brute-force complexity: \(O(N^L)\).
Dynamic programming algorithm complexity: \(O(N^2)\), or even less with pruning strategies.
Limitations in the choice of \(f\): the cost must decompose additively over segments.
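To make the dynamic-programming idea concrete, here is a minimal sketch (all names are illustrative) assuming a quadratic cost, the within-segment sum of squares; this naive version precomputes all segment costs and runs in \(O(LN^2)\), and pruning strategies reduce this further.

```python
import numpy as np

def all_segment_costs(y):
    """cost[i, j]: within-segment sum of squares of y[i:j] (j exclusive)."""
    n = len(y)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        s = s2 = 0.0
        for j in range(i + 1, n + 1):
            s += y[j - 1]
            s2 += y[j - 1] ** 2
            cost[i, j] = s2 - s ** 2 / (j - i)
    return cost

def best_segmentation(y, L):
    """Best positions of L change points, by dynamic programming."""
    n = len(y)
    cost = all_segment_costs(y)
    # V[l, j]: best cost of splitting y[:j] into l segments
    V = np.full((L + 2, n + 1), np.inf)
    back = np.zeros((L + 2, n + 1), dtype=int)
    V[1] = cost[0]                        # a single segment, no change point
    for l in range(2, L + 2):             # L change points = L + 1 segments
        for j in range(l, n + 1):
            cand = V[l - 1, :j] + cost[:j, j]
            back[l, j] = int(np.argmin(cand))
            V[l, j] = cand[back[l, j]]
    taus, j = [], n                       # backtrack the change points
    for l in range(L + 1, 1, -1):
        j = back[l, j]
        taus.append(j)
    return sorted(taus), V[L + 1, n]
```

Here each returned \(\tau_\ell\) is the (0-indexed) start of segment \(\ell+1\), so the segments are y[:tau_1], y[tau_1:tau_2], and so on.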
A parametric model (\(M, \boldsymbol{\theta}\)) produces \({\boldsymbol{Y}}\) and \({\boldsymbol{Z}}\).
The only observed data are \({\boldsymbol{Y}}\) while \({\boldsymbol{Z}}\) are hidden variables.
Questions are: how to estimate the parameters \(\boldsymbol{\theta}\), and how to reconstruct the hidden states \({\boldsymbol{Z}}\)?
Bayes' formula is the key:
\[{\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}})={\mathbb{P}}({\boldsymbol{Y}}\vert {\boldsymbol{Z}}){\mathbb{P}}({\boldsymbol{Z}})={\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}) {\mathbb{P}}({\boldsymbol{Y}})\]
Assume that the number of states \(K\) is known
Model parameters \(\boldsymbol{\theta}=(\boldsymbol{\pi}, \boldsymbol{\gamma})\)
We could assume, for instance, that \(Y_i \vert Z_i = k \sim f_{\gamma_k}\) for some parametric family \(f\).
Question: If \(K=2\), what are \(\boldsymbol{\pi}\) and \(\boldsymbol{\gamma}\)?
Couples \(\{(Y_i, Z_i)\}\) are i.i.d.
Label switching: the model is invariant under any permutation of the labels \(\{1, \dots, K\}\) \(\Rightarrow\) the mixture model has \(K!\) equivalent definitions.
Distribution of a \(Y_i\): \[{\mathbb{P}}(Y_i)=\sum_{k=1}^K {\mathbb{P}}(Y_i, Z_i=k)= \sum_{k=1}^K \class{bleu}{{\mathbb{P}}(Z_i=k)} \class{orange}{{\mathbb{P}}(Y_i | Z_i=k)}\]
Distribution of \({\boldsymbol{Y}}\):
\[\begin{align} {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) & = \prod_{i=1}^n \sum_{k=1}^K P(Y_i , Z_i=k;\boldsymbol{\theta}) \\ &= \prod_{i=1}^n \sum_{k=1}^K \class{bleu}{{\mathbb{P}}(Z_i=k; \boldsymbol{\pi})} \class{orange}{{\mathbb{P}}(Y_i | Z_i=k; \boldsymbol{\gamma})} \\ &= \prod_{i=1}^n \sum_{k=1}^K \class{bleu}{\pi_k} \class{orange}{f_{\gamma_k}(Y_i)} \end{align}\]
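The likelihood above translates directly into code. A minimal sketch, assuming Gaussian emission densities \(f_{\gamma_k}\) (the distributional choice and all names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(y, pi, mu, sigma):
    """log P(Y; theta) = sum_i log sum_k pi_k f_{gamma_k}(y_i),
    for Gaussian emissions with gamma_k = (mu_k, sigma_k)."""
    dens = norm.pdf(y[:, None], loc=mu[None, :], scale=sigma[None, :])  # (n, K)
    return float(np.sum(np.log(dens @ pi)))
```

For long series one would rather sum log-densities with scipy.special.logsumexp to avoid numerical underflow.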
Modelling the dependence in the state sequence: if an animal is feeding at time \(i\), it is more likely to be feeding at time \(i+1\) than if it was travelling at time \(i\). \[{\mathbb{P}}(Z_{i+1}=1 \vert Z_{i}=1) \ne {\mathbb{P}}(Z_{i+1}=1 \vert Z_{i}=2)\]
Markov Chain definition
\({\boldsymbol{Z}}\) is a Markov chain if \[P(Z_{i+1} \vert Z_{1:i}) = P(Z_{i+1} \vert Z_{i})\]
where \(Z_{1:i}\) stands for \(Z_{1}, \ldots, Z_i\).
\({\boldsymbol{Z}}\) is completely defined by the initial distribution \(\boldsymbol{\nu}\), \(\nu_k={\mathbb{P}}(Z_1=k)\) and the transition matrix \(\boldsymbol{\Pi}\)
Hidden States \({\boldsymbol{Z}}\) model: \({\boldsymbol{Z}}\) is assumed to follow a Markov Chain model with unknown initial distribution \(\boldsymbol{\nu}\) and transition matrix \(\boldsymbol{\Pi}\).
Observations \({\boldsymbol{Y}}\) model: The \(Y_i\)'s are assumed to be independent conditionally on \({\boldsymbol{Z}}\): \((Y_i\vert Z_i = k) \overset{i.i.d}{\sim} f_{\gamma_k}(\cdot).\)
Model parameters are \(\boldsymbol{\theta}=(\boldsymbol{\nu}, \boldsymbol{\Pi}, \boldsymbol{\gamma})\)
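The two layers are easy to simulate, which is a useful sanity check; a sketch with Gaussian emissions (an assumption, chosen to match the worked example below):

```python
import numpy as np

def simulate_hmm(n, nu, Pi, mu, sigma, seed=0):
    """Simulate (Z, Y): Z is a Markov chain (nu, Pi) and
    Y_i | Z_i = k ~ N(mu_k, sigma_k^2), independently given Z."""
    rng = np.random.default_rng(seed)
    K = len(nu)
    Z = np.empty(n, dtype=int)
    Z[0] = rng.choice(K, p=nu)                # initial distribution nu
    for i in range(1, n):
        Z[i] = rng.choice(K, p=Pi[Z[i - 1]])  # transition from Z_{i-1}
    Y = rng.normal(mu[Z], sigma[Z])           # emissions given the states
    return Z, Y
```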
Define all the key elements of a hidden Markov model with \(K\) hidden states.
Goal: maximum likelihood estimation of the parameters
\[(\widehat{\boldsymbol{\gamma}},\widehat{\boldsymbol{\Pi}}, \widehat{\boldsymbol{\nu}}) = \arg\max_{\boldsymbol{\gamma}, \boldsymbol{\Pi}, {\boldsymbol{\nu}}} \log P({\boldsymbol{Y}}; \boldsymbol{\gamma}, \boldsymbol{\Pi}, \boldsymbol{\nu})\]
\[\log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) = \log \sum_{(k_1, \ldots, k_N) \in \{1, \ldots, K\}^N }{\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}}= (k_1, \ldots,k_N); \boldsymbol{\theta})\]
No analytical estimators.
It is generally impossible to compute directly, since this sum involves \(K^N\) terms: for \(K=2\) and \(N=100\), \(2^{100}\approx10^{30}\).
A brute-force algorithm is not the way to go.
A convenient notation: \(Z_{ik}=1\) if \(Z_i=k\), and \(Z_{ik}=0\) otherwise.
\[\begin{align} \log {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}}; \boldsymbol{\theta}) & = \class{bleu}{\sum_k Z_{1k} \log \nu_k }\\ &\class{bleu}{+ \sum_{i > 1} \sum_{k, \ell} Z_{i-1,k}Z_{i,\ell} \log \pi_{k\ell}} \\ & + \class{orange}{\sum_i \sum_k Z_{ik} \log f(y_i ; \gamma_k)} \end{align}\]
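The three terms map one-to-one onto code; a sketch, again assuming Gaussian \(f\) (an assumption), where the integer labels in Z encode the one-hot \(Z_{ik}\):

```python
import numpy as np
from scipy.stats import norm

def complete_loglik(y, Z, nu, Pi, mu, sigma):
    """log P(Y, Z; theta): initial + transition + emission terms."""
    ll = np.log(nu[Z[0]])                          # sum_k Z_1k log nu_k
    ll += np.sum(np.log(Pi[Z[:-1], Z[1:]]))        # sum over transitions
    ll += np.sum(norm.logpdf(y, mu[Z], sigma[Z]))  # sum_i sum_k Z_ik log f
    return float(ll)
```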
Idea: replace \(Z_{ik}\) by its best guess given the observations, \({\mathbb{E}}(Z_{ik}\vert Y_{1:N})\)
\[\begin{align} {\mathbb{E}}\left( \log {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}}; \boldsymbol{\theta})\vert Y_{1:N}\right) & = \class{bleufonce}{\sum_k{\mathbb{E}}\left( Z_{1k} \vert Y_{1:N}\right) \log \nu_k }\\ & \class{bleufonce}{+ \sum_{i > 1} \sum_{k, \ell} {\mathbb{E}}\left( Z_{i-1,k}Z_{i,\ell}\vert Y_{1:N}\right) \log \pi_{k\ell}} \\ & + \class{rouge}{\sum_i \sum_k {\mathbb{E}}\left(Z_{ik}\vert Y_{1:N}\right) \log f(y_i ; \gamma_k)} \end{align}\]
Expectation-Maximization (EM) algorithm
Bayes Formula
\[\begin{align} {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta}) & = {\mathbb{P}}({\boldsymbol{Y}}\vert {\boldsymbol{Z}}; \boldsymbol{\theta}) P({\boldsymbol{Z}}; \boldsymbol{\theta}),\\ & = {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}) {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}). \end{align}\]
Therefore, \[\begin{align} \log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) & = \log \left \lbrace {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta}) / {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}) \right\rbrace\\ & = \log {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta}) - \log {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}) \\ \end{align}\]
For a given \(\boldsymbol{\theta}_0\), we may compute \({\mathbb{P}}_{\boldsymbol{\theta}_0}={\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}_0)\) and \[\begin{align} \log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) &= \class{orange}{{\mathbb{E}}_{\boldsymbol{\theta}_0}(\log P({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta})\vert {\boldsymbol{Y}})} - {\mathbb{E}}_{\boldsymbol{\theta}_0}(\log {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta})\vert {\boldsymbol{Y}})\\ & = \class{orange}{Q(\boldsymbol{\theta}, \boldsymbol{\theta}_0)} - H(\boldsymbol{\theta}, \boldsymbol{\theta}_0) \end{align}\]
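Maximizing \(Q(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\) in \(\boldsymbol{\theta}\) has a closed form. A sketch of this M-step for Gaussian emissions (an assumption), taking as input the E-step quantities \(\tau_{ik}={\mathbb{E}}_{\boldsymbol{\theta}_0}(Z_{ik}\vert {\boldsymbol{Y}})\) and \(\xi_{ik\ell}={\mathbb{E}}_{\boldsymbol{\theta}_0}(Z_{i,k}Z_{i+1,\ell}\vert {\boldsymbol{Y}})\) delivered by the forward-backward recursion below:

```python
import numpy as np

def m_step(y, tau, xi):
    """Closed-form maximizer of Q for a Gaussian HMM.
    tau: (n, K) smoothing probabilities; xi: (n-1, K, K) pairwise terms."""
    nu = tau[0]                              # initial distribution
    Pi = xi.sum(axis=0)
    Pi /= Pi.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    w = tau.sum(axis=0)                      # expected time spent in each state
    mu = tau.T @ y / w                       # weighted means
    sigma = np.sqrt((tau * (y[:, None] - mu) ** 2).sum(axis=0) / w)
    return nu, Pi, mu, sigma
```

Alternating this M-step with the E-step increases \(\log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta})\) at every iteration.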
Let’s compute it together
\(K=2\), \(\pi_{11}=0.9\), \(\pi_{22}=0.8\)
\[Y_i \vert Z_i = 1 \sim\mathcal{N}(2, 1)\]
\[Y_i \vert Z_i = 2 \sim\mathcal{N}(0, 1.5)\]
Forward-backward recursion to compute
\[{\mathbb{E}}(Z_{ik}\vert {\boldsymbol{Y}})\]
\[{\mathbb{E}}(Z_{i-1,k}Z_{i,\ell}\vert {\boldsymbol{Y}})\]
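A minimal sketch of the scaled recursion (scaling prevents numerical underflow; Gaussian emissions and all names are assumptions), applied to the two-state example above, taking the unspecified initial distribution as uniform and reading 1.5 as a standard deviation:

```python
import numpy as np
from scipy.stats import norm

def forward_backward(y, nu, Pi, mu, sigma):
    """Smoothing probabilities tau[i, k] = E(Z_ik | Y) and pairwise
    terms xi[i, k, l] = E(Z_{i,k} Z_{i+1,l} | Y), by scaled recursions."""
    n, K = len(y), len(nu)
    dens = norm.pdf(y[:, None], mu[None, :], sigma[None, :])  # f_{gamma_k}(y_i)
    alpha, c = np.empty((n, K)), np.empty(n)  # alpha[i] = P(Z_i | y_1:i)
    alpha[0] = nu * dens[0]
    c[0] = alpha[0].sum()                     # c[i] = P(y_i | y_1:i-1)
    alpha[0] /= c[0]
    for i in range(1, n):                     # forward pass
        alpha[i] = (alpha[i - 1] @ Pi) * dens[i]
        c[i] = alpha[i].sum()
        alpha[i] /= c[i]
    beta = np.ones((n, K))                    # backward pass, same scaling
    for i in range(n - 2, -1, -1):
        beta[i] = Pi @ (dens[i + 1] * beta[i + 1]) / c[i + 1]
    tau = alpha * beta                        # E(Z_ik | Y)
    xi = (alpha[:-1, :, None] * Pi[None, :, :]
          * (dens[1:] * beta[1:])[:, None, :] / c[1:, None, None])
    return tau, xi

# The worked example: nu is not specified above, taken uniform here.
Pi = np.array([[0.9, 0.1], [0.2, 0.8]])
nu = np.array([0.5, 0.5])
mu, sigma = np.array([2.0, 0.0]), np.array([1.0, 1.5])
rng = np.random.default_rng(1)
Z = [rng.choice(2, p=nu)]
for _ in range(199):
    Z.append(rng.choice(2, p=Pi[Z[-1]]))
Z = np.asarray(Z)
y = rng.normal(mu[Z], sigma[Z])
tau, xi = forward_backward(y, nu, Pi, mu, sigma)  # n = 200 smoothing probs
```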